An importance matrix (imatrix) is calibration data derived by running a model over a representative text corpus. It records which weights matter most for accurate predictions, and the quantizer uses this information to allocate precision where it reduces the most loss. imatrix is supported for all quant types except bitnet. Its effect is most significant at lower bit levels: for quants below
Q6_0, using an imatrix is strongly recommended.
Generating an imatrix
Prepare a calibration dataset
You need a plain-text file containing representative text. A common choice is a deduplicated sample from the same domain the model will be used in.
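A sketch of a typical generation run, assuming the llama-imatrix tool and its standard -m (model), -f (calibration file), and -o (output) flags; the file paths are illustrative:

```shell
# Run the model over the calibration text and write the importance matrix.
# Paths below are placeholders; adjust to your model and dataset.
./llama-imatrix -m models/model-f16.gguf -f calibration.txt -o imatrix.dat
```

The run streams the calibration text through the model in chunks and accumulates activation statistics, so larger calibration files take proportionally longer.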
Options
--layer-similarity
Collect cosine-similarity statistics that measure how much each layer changes its activations:

--hide-imatrix
Obscure the imatrix provenance in the output file. When this flag is set, llama-imatrix stores top_secret in the data file name and calibration dataset fields, and writes zeros for the batch size and number of chunks:
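Both options are added to an ordinary generation run. A sketch, assuming llama-imatrix's standard -m/-f/-o flags and illustrative paths:

```shell
# Collect per-layer cosine-similarity statistics alongside the imatrix
./llama-imatrix -m models/model-f16.gguf -f calibration.txt -o imatrix.dat --layer-similarity

# Same run, but scrub provenance metadata from the output file
./llama-imatrix -m models/model-f16.gguf -f calibration.txt -o imatrix.dat --hide-imatrix
```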
Using an imatrix when quantizing
Pass --imatrix to llama-quantize:
Always pass --imatrix for best results; omitting it on quants below Q6_0 will noticeably degrade output quality.
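A minimal sketch, assuming llama-quantize's usual argument order (imatrix flag, input GGUF, output GGUF, quant type); file names and the Q4_K_M target are illustrative:

```shell
# Quantize to Q4_K_M, with precision allocation guided by the importance matrix
./llama-quantize --imatrix imatrix.dat models/model-f16.gguf models/model-q4_k_m.gguf Q4_K_M
```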
Converting GGUF imatrix files
Some imatrix files are distributed in the newer GGUF format rather than the .dat format used by ik_llama.cpp. Convert them with the included script:
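The converter's exact name is not stated here, so the invocation below is a placeholder sketch only; the script name and its arguments are hypothetical, and you should check the repository's scripts directory for the real converter:

```shell
# Hypothetical script name; substitute the actual converter shipped with ik_llama.cpp
python convert_imatrix_to_dat.py imatrix.gguf -o imatrix.dat
```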
Verifying imatrix usage
To confirm that a downloaded GGUF was quantized with an imatrix, inspect its metadata. Look for fields with the quantize.imatrix.* prefix — their presence confirms an imatrix was applied during quantization.
You can view metadata with gguf_dump.py:
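A sketch of one way to do this, assuming the gguf_dump.py script from llama.cpp's gguf-py package (the path may differ in your checkout), filtering the dump for imatrix-related keys:

```shell
# Dump metadata only (skip tensor listings) and filter for imatrix fields
python gguf-py/scripts/gguf_dump.py --no-tensors model-q4_k_m.gguf | grep imatrix
```

If the model was quantized with an imatrix, the output should include keys such as quantize.imatrix.file and quantize.imatrix.dataset.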